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An algorithm for separating the high- and low-frequency molecular dynamics modes in Hybrid Monte Carlo 
(HMC) simulations of gauge theories with dynamical fermions is presented. The separation is based on splitting 
the pseudo-fermion action into two parts, as was initially proposed by Hasenbusch. We propose to introduce 
different evolution time-scales for each part. We test our proposal in realistic simulations of two-flavor 0(a) 
improved Wilson fermions. A speed-up of more than a factor of three compared to the standard HMC algorithm 
is observed in a typical run. 



1. Introduction. 

The numerical simulations of QCD with Wil- 
son fermions using the HMC algorithm [1] pro- 
vide a significant challenge as the quark masses 
become smaller. Any improvement to make the 
simulations faster will help to increase the overlap 
between the domain of chiral perturbation theory 
and lattice QCD, allowing for an extrapolation to 
physical quark masses [2]. 

In the standard implementation of HMC one 
introduces pseudo-fermion fields to take into ac- 
count the contribution of the fermion determi- 
nant. As the quark mass becomes lighter, the 
force induced by pseudo-fermions produces in- 
creasingly large high-frequency fluctuations. One 
is therefore forced to decrease the step-size of the 
integration scheme to keep a constant acceptance 
rate. This problem is addressed in the literature 
as "ultra-violet slowing-down" [3]. 



A possible solution of this problem is the in- 
troduction of multiple time-scales for different 
parts of the action in performing the discrete in- 
tegration of molecular dynamics (MD) equations 
of motion in the fictitious time. This approach 
was initially advocated in Ref. [4], where the au- 
thors proposed to introduce different time-scales 
for the Yang-Mills term and the pseudo-fermion 
action. Since the number of arithmetic opera- 
tions required to evaluate the pure gauge force 
is much smaller than the one needed to evaluate 
the pseudo-fermion force, one can keep a smaller 
step-size for the pure gauge part of the action. 
However for light fermions, the highest frequency 
fluctuations belong mainly to the pseudo-fermion 
action, so the approach of Ref. [4] gives only a 
moderate improvement in that case. 

In Ref. [5] it was suggested that a multiple 
time-scale scheme is efficient only if one can split 
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the action 

S = S UV + S IR (1) 

in a way to satisfy the following two criteria si- 
multaneously: 

• the force term generated by Sjjv is cheap 
to compute compared to Sir; 

• the splitting (1) mainly captures the high- 
frequency modes of the system in Sjjv and 
the low- frequency modes in Sir. 

If these criteria are met, one can keep a relatively 
large step-size for the "infra-red" part of the ac- 
tion Sir (which generates the computationally 
more expensive force term) and relax the step-size 
for the "ultra-violet" part Sjjv, while the quark 
mass is becoming smaller. 

Ref. [5] proposed to use a low-order polynomial 
approximation for mimicking the high-frequency 
modes of the pseudo-fermion action. The algo- 
rithm was tested for the 2D Schwinger model with 
Wilson fermions, producing a substantial speed- 
up in comparison with the standard HMC imple- 
mentation. 

In the present work we are testing a different 
approach. In Ref. [6] it was proposed to split 
the pseudo-fermion action into two parts, par- 
tially separating the small and large eigenvalues 
of the Dirac matrix. This splitting reduces the 
condition number of the fermion matrix, allowing 
for a larger step-size. In Refs. [7-9] the method 
was developed and successfully applied to four- 
dimensional lattice QCD with two flavors of Wil- 
son fermions. We propose to further improve this 
method by putting the two contributions of the 
pseudo-fermion action from Ref. [6] on different 
time-scales of the integration scheme. 

Our proposal was already considered by Hasen- 
busch while preparing Ref. [6], but he found 
no advantage (see Ref. [9]). Since this state- 
ment referred to tests within the two-dimensional 
Schwinger model, we have decided to repeat the 
tests for lattice QCD to see if one can profit from 
the "multiple time-scales idea" there. We have 
found that for two-flavor 0(a) improved Wilson 
fermions the introduction of different time-scales 
for the splitting chosen as in Ref. [6] , indeed gives 



some speed-up compared to the case where the 
time-scale is the same for both parts. 

In all our runs we have used the educated initial 
guess (chronological inversion method) proposed 
in Ref. [10]. This method estimates the trial so- 
lution for the matrix inversion as a linear super- 
position of a sequence of solutions in the recent 
past while performing the integration along the 
MD trajectory. (It was always checked that the 
accuracy of inversion was sufficient to make the 
solution effectively exact and keep the algorithm 
reversible.) The smaller the step-size of the MD 
integration scheme, the more efficient the chrono- 
logical inversion method is. Hence the new im- 
provements, which increase the effective step-size, 
may seem less efficient because the chronological 
guess becomes worse. Therefore, the improve- 
ment factor which we gained from splitting the 
pseudo-fermion action and introducing multiple 
time-scales is less pronounced than it would be if 
we had not used the chronological inversion. (We 
did not switch off the chronological guess because 
our tests were part of a production run.) One 
can hope to gain relatively more from the method 
tested in this paper if one is not using the edu- 
cated guess. Nevertheless, it would be naive to 
say that the chronological inversion makes things 
worse. As the quark mass decreases, the step- 
sizes for each time-scale inevitably decrease, so 
the chronological inversion method will realize its 
potential after all, making the increase of com- 
puter power requirements smoother. 

The paper is organized as follows. In sec- 
tion 2 we recall lattice QCD with 0(a) im- 
proved [11], even-odd preconditioned [12,13] Wil- 
son fermions. We also briefly describe the split- 
ting of the pseudo-fermion action proposed in 
Ref. [6] . In section 3 we discuss the multiple time- 
scales integration scheme in the HMC algorithm 
and specify the splitting of the action (the choice 
of time-scales) for the models which we are going 
to simulate. In section 4 we give the details of our 
simulations and present the results. Conclusions 
follow in section 5. 
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2. The model 

We test our proposal by simulating two-flavor 
QCD with 0(a) improved [11], even-odd precon- 
ditioned Wilson fcrmions. One of the possible 
effective actions for a standard HMC simulation 
of this theory is given in Refs. [12,13]: 

S [U, 4>\ (/>] = S G [U] + S det [U] + ^(QtQrV .(2) 

Here Sg[U] is the usual Wilson plaquette action, 
<ft , 4> are pseudo-fermion fields, 

S det [[/] = -2Trlog(l + T 00 ) , (3) 

Q = (1 + T) ee - M eo (l + T)- lM oe , (4) 

T ee (T OQ ) is the clover matrix (diagonal in coordi- 
nate space) on the even (odd) sites 

{T) aa ,bd x ) = ^ c swH<J^Tf u {x) , (5) 

and the off-diagonal parts M eo and M oe , which 
connect the even with odd and odd with even 
sites, respectively, are the usual Wilson hopping 
matrices (see Ref. [12] for further details). 

According to Ref. [6] we start to modify the 
action (2) by introducing other pseudo-fermion 
fields x\x- 

S 1 [U,<j>1,<t>]=S G [U} + S det [U} + 
tfW^Q^WU + x\W^W)' 1 X , (6) 

where W is some auxiliary matrix. The idea of 
this modification is that W, as well as QW^ 1 , 
have smaller condition numbers than the original 
matrix Q. This reduces the fluctuations of the 
HMC Hamiltonian at the end of the MD trajec- 
tory, allowing for a larger step-size in the HMC 
simulation at the same acceptance rate. 

We consider here only the following choice of 
the matrix W [9]: 

W = Q + p, (7) 

which depends on one real parameter p. 1 Up 
to the multiplication by a constant factor this 

1 Another possibility, discussed in Refs. [7-9], is 
W = Q + «75p . 

We did not consider this, although the generalization of 
our approach to this choice of the auxiliary matrix is 
straightforward. 



matrix is equivalent to the original proposal of 
Ref. [6]. 

The modification of the pseudo-fermion action 
(6) can be easily implemented, if a standard HMC 
program is already available (see Refs. [6,9] for 
details). 

3. Multiple time-scales 

The introduction of multiple time-scales for dif- 
ferent segments of the action in the HMC method 
was initially proposed by the authors of Ref. [4] . 
Following their idea, one constructs a reversible 
integrator Vm (t) for the action (1) by 

V M (r) = V IR (I) ■ 

V IR (I) , (8) 

where M is a positive integer, and the effect of 
Vq,Vuv,Vib, on the system coordinates {P, Q} 
is given by 

Vq (t) : Q^Q + tP, 

Vuv (r) : P -> P - rdS uv , 

Vm (r) : P - P - rdS IR . (9) 

This integrator effectively contains two evolution 
time-scales, t and t/M. The choice of M is 
a trade-off between the computational overhead 
from computing the force dSjjv more frequently, 
and the gain from reducing the fluctuations of the 
HMC Hamiltonian at the end of the MD trajec- 
tory. In the case M = 1 one gets an ordinary 
leap-frog integrator. 

For testing the efficiency of our approach we 
performed numerical simulations for three mod- 
els. The first model is based on the action (2). 
The other two models differ from each other by a 
different splitting (1) of the action (6): 

• Model A 

Suv = Sg[U], 

Si R = S det {U}+ ( / ) \QiQ)- 1 cj ) (10) 
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• Model B 



S UV = S G [U], 
Sir — Sdet[U] - 



(11) 



• Model C 



S uv = S G [U] + S det [U]+x\W^W)- x X , 
Sir = ^WiQ^Q^W^ (12) 

Model A is just a standard HMC algorithm 
for which the original splitting of the time-scale 
proposed by Sexton and Weingarten [4] is ap- 
plied. Model B is the modification proposed by 
Hasenbusch [6] , which was numerically studied in 
Rcfs. [8,9]. Finally, model C is our proposal for 
introducing different time-scales for the two parts 
of the pseudo-fermion action (6). 

The splitting (12) is motivated by the hypoth- 
esis that most of the high-frequency modes of the 
pseudo-fermion part of the action (6) are located 
in x (W^ W)~ l x- We also put the clover determi- 
nant Sdet[U] on the "ultraviolet" time-scale be- 
cause the force generated by it is computation- 
ally cheap. The computationally expensive term 
^WiQ^Q^WU is put on the "infra-red" time- 
scale. 

4. Simulation details and results 

We tested the approach (12) in production 
runs on a 16 3 x 32 lattice at /3 = 5.29, n = 
0.1355, c sw = 1.9192 done by the QCDSF- 
UKQCD collaboration [14], [15], [16]. These pa- 
rameters correspond to ra^jra p w 0.7. The pro- 
gram was executed on the APEmille [17] at NIC 
Zeuthen. 

For the fcrmion fields we use periodic (antiperi- 
odic) boundary conditions in the spatial (time) 
directions. A trajectory was composed of N steps 
consecutive steps (8), with the trajectory length 
equal to 1: 



N 



• T = 1 



(13) 



The linear equations appearing in the calculation 
of the fermionic force and in updating <p, <ft we 



solve by the conjugate gradient algorithm. In all 
cases the starting vector for the iterative solution 
was the linear superposition of N guess solutions 
from the recent past [10]. In all our simulations 
we kept the value N guess — 7, which was empiri- 
cally found to be close to the optimum. 

We performed one run for model A, two runs 
for model B, and a few runs for model C with dif- 
ferent values for p and M . All runs had a length of 
300 trajectories, which allowed us to get a reason- 
able estimate of the acceptance rates P acc - Our 
strategy was to try to keep the same acceptance 
rates for all runs by tuning the step-size r. 

The main goal of this study was to compare 
the efficiencies of A, B, and C. These efficiencies 
are determined by the amount of CPU-time tc pu 
required for estimates of some observables with a 
given statistical error. Since the computer time 
in simulations with dynamical fermions is mostly 
spent in the calculation of the pseudo-fermion 
force, the CPU-cost is roughly proportional to 



tcPU OC (Nq + N W ) ■ T int ■ 



(14) 



Here Nq and N\y denote the average number of 
multiplications by the matrices Q^Q and W^W, 
respectively, required for producing one MD tra- 
jectory, and Tint is the integrated autocorrelation 
time for the observable under study. Unfortu- 
nately, our computer resources did not allow us 
to estimate Tint reliably. Therfore we base our 
investigation on the hypothesis that for fixed ac- 
ceptance rates, the autocorrelation times are the 
same for the different approaches and different 
parameter sets considered in this paper, i.e. for 
all runs 



Tint OC 



1 



P 

1 ac 



(15) 



This hypothesis was confirmed by simulations of 
model B on a 8 3 x 24 lattice in Ref. [9] . 

Therefore, we measured the relative gain in 
computer time with respect to the standard HMC 
algorithm (model A) for the approach tested in 
the z-th run by the following formula: 
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Table 1 

Runs of 300 trajectories each on the 16 3 x 32 lattice at (i — 5.29, k = 0.1355, c sw — 1.9192 for the models 
of Eqs. (10), (11), (12). Here p is the parameter in the operator (7), M defines the second time-scale of 
the integration scheme (8), N steps = 1/t is the number of steps of which the trajectory with length 1 was 
composed. P acc is the acceptance rate, Nq and N w denote the average number of multiplications per 
trajectory by the matrices Q^Q and W^W, respectively. D gain denotes the speed-up factor with respect 
to the standard HMC algorithm (model A). 



model 


P 


M 


Nsteps 


Pace 


Nq 


N w 


Nq + N w 


D gain 


A 





3 


140 


0.601 


139492 





139492 


l 


B 


0.5 


3 


100 


0.599 


65951 


5233 


71184 


1.95 


B 


0.2 


3 


70 


0.664 


47214 


7378 


54592 


2.82 


C 


0.5 


3 


50 


0.547 


45160 


7687 


52847 


2.40 


C 


0.2 


3 


40 


0.663 


32659 


12373 


45032 


3.42 


C 


0.1 


3 


30 


0.603 


24932 


15938 


40870 


3.42 


C 


0.07 


3 


30 


0.640 


24512 


20738 


45250 


3.28 


C 


0.1 


5 


30 


0.733 


24622 


26235 


50857 


3.35 



(i) 

The larger the gain D^ in for the i-tb. run, the 
less computer time is required for estimating the 
observables by using the approach tested in that 
run. 

We present our results in Table 1. The follow- 
ing observations can be made: 

• Putting the two contributions of the 
pseudo-fermion part of the action (6) on dif- 
ferent time-scales of the integration scheme 
(model C) gives some additional gain in 
computer time compared to the case, where 
the time-scale is the same for both parts 
(model B). A speed-up of « 20% is ob- 
served for p — 0.5 and p = 0.2. 

• In agreement with the studies in Refs. [8,9], 
we observe that for fixed M the perfor- 
mance of the approach C is best for some 
optimal value p, which in our case is likely 
to lie in the interval p E [0.1, 0.2] for M = 3. 

• In one of the runs we increased the value of 
M from 3 to 5, while p — 0.1 was close to 
the optimal value. We kept the same step- 
size t for both runs. One sees that this 
change of M increased the acceptance rate 
P acC i but the gain in computer time D gain 
stayed almost the same (or even slightly de- 
creased) due to the computational overhead 



coming from calculating the pseudo-fermion 
force dSuv more frequently. 

• By using the approach C one achieves a 
speed-up of more than a factor three com- 
pared to the standard HMC algorithm A. 

Our computational resources did not allow for a 
further resolution of the algorithmic performance 
in the space of the parameters p, M . Probably, 
the best improvement factor which we obtained, 
Dgain = 3.42, can still be slightly increased by 
further tuning of the parameters. However, we 
notice that D gain seems to be quite stable for 
some range of the parameter p. Therefore, we ex- 
pect that not much tuning of the algorithm will 
be required in the forthcoming production runs. 

5. Conclusions 

In Refs. [6-9] it was suggested to accelerate 
the HMC simulation of dynamical fermions by 
splitting the fermion matrix into two factors with 
smaller condition numbers than that of the origi- 
nal matrix, and introducing pseudo-fermion fields 
for each of the factors. Inspired by the proposal of 
Rcf. [5] , we tested the possibility to further speed 
up the simulations by putting each part of this 
new pseudo-fermion action on a separate time- 
scale. We have found that such a strategy gave 



G 



a speed-up of « 20% in comparison to the case, 
where the time-scale was the same for both parts. 

In our simulations, which are a part of the 
production runs of the QCDSF collaboration, we 
have found a reduction of the numerical cost of 
more than a factor three compared to the stan- 
dard HMC algorithm. One can speculate that 
this reduction would have been even higher if we 
had not used the educated chronological guess 
[10]. One can also expect that the gain in com- 
puter time will grow as the quark mass decreases. 

Further work in the direction of algorithmic im- 
provement can be done by testing more compli- 
cated integration schemes than that of Eq. (8). 
In Rcf. [9] it was shown that the splitting of the 
pseudo-fermion action (6) provided more compu- 
tational gain for the partially improved integra- 
tion scheme suggested in Eq. (6.4) of Ref. [4] than 
for the standard leap-frog scheme. It may be in- 
teresting to check the compatibility of that in- 
tegration scheme (and higher order integration 
schemes) with the multiple time-scales approach 
studied in our paper. 

When going to smaller quark masses, a possi- 
bility might be to generalize the idea of Ref. [6] by 
splitting the pseudo-fermion action into three or 
more parts [9]. One can introduce different time- 
scales for each part of the pseudo-fermion action 
in such an approach, profiting even more from the 
separation of high- and low-frequency modes. 
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